tidymodels: Model Stacking

https://www.tidymodels.org/

Setup

library(tidymodels)
library(stacks)
# tidymodels
# library(rsample)
# library(recipes)
# library(parsnip)
# library(tune)
# library(dials)
# library(workflows)
# library(yardstick)

# install treesnip to provide parsnip bindings for catboost and LightGBM
# remotes::install_github("curso-r/treesnip") 
library(treesnip)
library(lightgbm)

library(tidyverse)

library(skimr)
library(patchwork)
library(gt)
library(janitor) # data cleaning

Speed up computation with parallel processing (optional)

library(doParallel)
all_cores <- parallel::detectCores(logical = FALSE)
registerDoParallel(cores = all_cores)
# set the random seed so we can reproduce any simulated results.
set.seed(1004)
ggplot2::theme_set(theme_light())
library(tidylog, warn.conflicts = FALSE) # load this last!

knitr::opts_chunk$set(message = TRUE, warning = TRUE)

1 Tidy Model Stacking

1.1 Brief Summary of stacks::

stacks:: is an R package for model stacking that aligns with the tidymodels ecosystem.

Model stacking is an ensembling method that takes the outputs of many models and combines them to generate a new model, referred to as an ensemble in this package, that generates predictions informed by each of its members.

The process goes something like this:

  1. Define candidate ensemble members using functionality from rsample, parsnip, workflows, recipes, and tune
  2. Initialize a data_stack object with stacks()
  3. Iteratively add candidate ensemble members to the data_stack with add_candidates()
  4. Evaluate how to combine their predictions with blend_predictions(), creating a model_stack
  5. Fit candidate ensemble members with non-zero stacking coefficients with fit_members()
  6. Predict on new data with predict()

You can install the package with the following code:

install.packages("stacks")

Install the (unstable) development version with:

remotes::install_github("tidymodels/stacks", ref = "main")

Rather than diving right into the implementation, we’ll focus here on how the pieces fit together, conceptually, in building an ensemble with stacks.

See Section 2 for an example of how this grammar is implemented!

1.2 Grammar of Model Stacking

At the highest level, ensembles are formed from model definitions. In this package, model definitions are an instance of a minimal workflow, containing a model specification (as defined in the parsnip package) and, optionally, a preprocessor (as defined in the recipes package). Model definitions specify the form of candidate ensemble members.

To be used in the same ensemble, each of these model definitions must share the same resampling object. This rsample rset object, when paired with the model definitions, can be used to generate the tuning/fitting results objects for the candidate ensemble members with tune.

Candidate members first come together in a data_stack object through the add_candidates() function. Principally, these objects are just tibbles, where the first column gives the true outcome in the assessment set (the portion of the training set used for model validation), and the remaining columns give the predictions from each candidate ensemble member. (When the outcome is numeric, there’s only one column per candidate ensemble member. Classification requires as many columns per candidate as there are levels in the outcome variable.) They also bring along a few extra attributes to keep track of model definitions.

Then, the data stack can be evaluated using blend_predictions() to determine how best to combine the outputs from each of the candidate members. In the stacking literature, this process is commonly called metalearning.

The outputs of each member are likely highly correlated. Thus, depending on the degree of regularization you choose, the coefficients for the inputs of (possibly) many of the members will zero out—their predictions will have no influence on the final output, and those terms will thus be thrown out.

These stacking coefficients determine which candidate ensemble members will become ensemble members. Candidates with non-zero stacking coefficients are then fitted on the whole training set, altogether making up a model_stack object.

This model stack object, outputted from fit_members(), is ready to predict on new data! The trained ensemble members are often referred to as base models in the stacking literature.

The full visual outline for these steps:

The API for the package closely mirrors these ideas.

Note that a regularized linear model is one of many possible learning algorithms that could be used to fit a stacked ensemble model. For implementations of additional ensemble learning algorithms, see h2o::h2o.stackedEnsemble() and SuperLearner::SuperLearner().
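The grammar above can be illustrated without tidymodels at all. Below is a minimal hand-rolled sketch in base R, with made-up data and models (this is an analogy, not the stacks implementation): fit two candidate models on an analysis set, collect their predictions on a held-out assessment set into a "data stack", then learn non-negative blending weights for the final combination.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 2 * x + 3 * sin(x) + rnorm(n)

# split: first half "analysis", second half "assessment"
train <- 1:100
holdout <- 101:200

# two candidate members: a linear fit and a smoothing spline
m1 <- lm(y ~ x, data = data.frame(x, y)[train, ])
m2 <- smooth.spline(x[train], y[train])

# the "data stack": true outcome plus each member's assessment-set predictions
stack <- data.frame(
  truth = y[holdout],
  lin   = predict(m1, data.frame(x = x[holdout])),
  spl   = predict(m2, x[holdout])$y
)

# "blend_predictions": learn non-negative weights for combining the members
obj <- function(w) mean((stack$truth - (w[1] * stack$lin + w[2] * stack$spl))^2)
w <- optim(c(0.5, 0.5), obj, lower = c(0, 0), method = "L-BFGS-B")$par

blended <- w[1] * stack$lin + w[2] * stack$spl
```

By construction, the blended predictions can do no worse on the assessment set than the better of the two members, since the weight space includes putting all weight on either one.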




2 Regression Model Stacking

We’ll be working through an example of the workflow of model stacking with the stacks package. At a high level, the workflow looks something like this:

  1. Define candidate ensemble members using functionality from rsample, parsnip, workflows, recipes, and tune
  2. Initialize a data_stack object with stacks()
  3. Iteratively add candidate ensemble members to the data_stack with add_candidates()
  4. Evaluate how to combine their predictions with blend_predictions()
  5. Fit candidate ensemble members with non-zero stacking coefficients with fit_members()
  6. Predict on new data with predict()!

The package is closely integrated with the rest of the functionality in tidymodels—we’ll load those packages as well, in addition to some tidyverse packages to evaluate our results later on.

library(tidymodels)
library(stacks)
# library(tune)
# library(rsample)
# library(parsnip)
# library(workflows)
# library(recipes)
# library(yardstick)

library(tidyverse)
# library(purrr)
# library(dplyr)
# library(tidyr)

In this example, we’ll make use of the tree_frogs data exported with stacks, giving experimental results on hatching behavior of red-eyed tree frog embryos!

Red-eyed tree frog (RETF) embryos can hatch earlier than their normal 7ish days if they detect potential predator threat. Researchers wanted to determine how, and when, these tree frog embryos were able to detect stimulus from their environment. To do so, they subjected the embryos at varying developmental stages to “predator stimulus” by jiggling the embryos with a blunt probe. Beforehand, though, some of the embryos were treated with gentamicin, a compound that knocks out their lateral line (a sensory organ). Researcher Julie Jung and her crew found that these factors inform whether an embryo hatches prematurely or not!

2.1 Dataset

Tree frog embryo hatching data

  • A dataset containing experimental results on hatching behavior of red-eyed tree frog embryos.

Format

A data frame with 1212 rows and 7 variables:

  • clutch RETFs lay their eggs in gelatinous “clutches” of 30-40 eggs. Eggs with the same clutch ID are siblings of each other! This variable is useful in mixed effects models. (Unordered factor.)

  • treatment The treatment group for the embryo. Either “gentamicin”, a compound that knocks out the embryos’ lateral line, or “control” for the negative control group (i.e. sensory organs intact). (Character.)

  • reflex A measure of ear function called the vestibulo-ocular reflex, categorized into bins. Ear function increases from factor levels “low”, to “mid”, to “full”. (Ordered factor.)

  • age Age of the embryo, in seconds, at the time that the embryo was jiggled. (Numeric, in seconds.)

  • t_o_d The time of day that the stimulus (i.e. jiggle) was applied. “morning” is 5 a.m. to noon, “afternoon” is noon to 8 p.m., and “night” is 8 p.m. to 5 a.m. (Character.)

  • hatched Whether or not the embryo hatched in response to the jiggling! Either “yes” or “no”. (Character.)

  • latency Time elapsed between the stimulus (i.e. jiggling) and hatching in response to the stimulus, in seconds. Missing values indicate that the embryo didn’t hatch in response to the stimulus. (Numeric, in seconds.)

In this example, we’ll use most of the variables in tree_frogs to predict latency, the time elapsed (in seconds) between the predator stimulus and hatching.

data(tree_frogs, package="stacks")
str(tree_frogs)
-->  tibble [1,212 x 7] (S3: tbl_df/tbl/data.frame)
-->   $ clutch   : Factor w/ 62 levels "98","99","100",..: 30 20 23 3 57 2 20 16 3 37 ...
-->   $ treatment: chr [1:1212] "control" "gentamicin" "gentamicin" "control" ...
-->   $ reflex   : Factor w/ 3 levels "low","mid","full": 3 3 3 2 2 1 3 3 2 2 ...
-->   $ age      : num [1:1212] 466965 404310 426220 355360 356535 ...
-->   $ t_o_d    : chr [1:1212] "morning" "afternoon" "night" "night" ...
-->   $ hatched  : chr [1:1212] "yes" "no" "no" "no" ...
-->   $ latency  : num [1:1212] 22 NA NA NA NA 360 NA 106 180 NA ...
-->   - attr(*, "problems")= tibble [4,878 x 5] (S3: tbl_df/tbl/data.frame)
-->    ..$ row     : int [1:4878] 1213 1213 1213 1213 1213 1213 1213 1213 1213 1213 ...
-->    ..$ col     : chr [1:4878] "SUhour" "SUmin" "SUsec" "EpreSU" ...
-->    ..$ expected: chr [1:4878] "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" ...
-->    ..$ actual  : chr [1:4878] "10" "58" "15" "15" ...
-->    ..$ file    : chr [1:4878] "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" ...
DT::datatable(tree_frogs)
skimr::skim(tree_frogs)
Data summary
Name tree_frogs
Number of rows 1212
Number of columns 7
_______________________
Column type frequency:
character 3
factor 2
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
treatment 0 1 7 10 0 2 0
t_o_d 0 1 5 9 0 3 0
hatched 0 1 2 3 0 2 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
clutch 0 1 FALSE 62 119: 42, 145: 34, 182: 34, 223: 34
reflex 0 1 FALSE 3 ful: 664, low: 286, mid: 262

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1.00 401611.2 42583.6 348405 362225 398510 433860 480120 ▇▂▃▁▅
latency 640 0.47 93.4 70.6 0 47 71 112 442 ▇▃▁▁▁

We’ll start out with predicting latency (i.e. time to hatch) based on other attributes. We’ll need to filter out NAs (i.e. cases where the embryo did not hatch) first.

data(tree_frogs, package="stacks")

# subset the data
tree_frogs <- tree_frogs %>%
  filter(!is.na(latency)) %>%
  dplyr::select(-c(clutch, hatched))
-->      ---->  filter: removed 640 rows (53%), 572 rows remaining
# for the sake of quick typing,
frog <- tree_frogs

Taking a quick look at the data, it seems like the hatch time is pretty closely related to some of our predictors!

library(ggplot2)

ggplot(frog) +
  aes(x = age, y = latency, color = treatment) +
  geom_point() +
  labs(x = "Embryo Age (s)", y = "Time to Hatch (s)", col = "Treatment")

Let’s give this a go!

2.2 Define candidate ensemble members

At the highest level, ensembles are formed from model definitions. In this package, model definitions are an instance of a minimal workflow, containing a model specification (as defined in the parsnip package) and, optionally, a preprocessor (as defined in the recipes package). Model definitions specify the form of candidate ensemble members.

Defining the constituent model definitions is undoubtedly the longest part of building an ensemble with stacks. If you’re familiar with tidymodels “proper,” you’re probably fine to skip this section, keeping a few things in mind:

  • You’ll need to save the assessment set predictions and workflow utilized in your tune_grid(), tune_bayes(), or fit_resamples() objects by setting the control arguments save_pred = TRUE and save_workflow = TRUE. Note the use of the control_stack_*() convenience functions below!
  • Each model definition must share the same rsample rset object.

We’ll first start out with splitting up the training data, generating resamples, and setting some options that will be used by each model definition.

# some setup: resampling and a basic recipe
set.seed(1)
frog_split <- initial_split(frog)
frog_split
-->  <Analysis/Assess/Total>
-->  <429/143/572>
class(frog_split)
-->  [1] "rsplit"   "mc_split"
str(frog_split)
-->  List of 4
-->   $ data  : tibble [572 x 5] (S3: tbl_df/tbl/data.frame)
-->    ..$ treatment: chr [1:572] "control" "control" "control" "control" ...
-->    ..$ reflex   : Factor w/ 3 levels "low","mid","full": 3 1 3 2 3 3 3 3 3 3 ...
-->    ..$ age      : num [1:572] 466965 361180 401595 357810 397440 ...
-->    ..$ t_o_d    : chr [1:572] "morning" "night" "afternoon" "night" ...
-->    ..$ latency  : num [1:572] 22 360 106 180 60 39 214 50 224 63 ...
-->    ..- attr(*, "problems")= tibble [4,878 x 5] (S3: tbl_df/tbl/data.frame)
-->    .. ..$ row     : int [1:4878] 1213 1213 1213 1213 1213 1213 1213 1213 1213 1213 ...
-->    .. ..$ col     : chr [1:4878] "SUhour" "SUmin" "SUsec" "EpreSU" ...
-->    .. ..$ expected: chr [1:4878] "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" ...
-->    .. ..$ actual  : chr [1:4878] "10" "58" "15" "15" ...
-->    .. ..$ file    : chr [1:4878] "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" ...
-->   $ in_id : int [1:429] 2 3 4 5 6 7 8 9 10 11 ...
-->   $ out_id: logi NA
-->   $ id    : tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
-->    ..$ id: chr "Resample1"
-->   - attr(*, "class")= chr [1:2] "rsplit" "mc_split"
frog_train <- training(frog_split)
frog_test  <- testing(frog_split)
class(frog_train)
-->  [1] "tbl_df"     "tbl"        "data.frame"
set.seed(1)
frog_folds <- rsample::vfold_cv(frog_train, v = 5)
class(frog_folds)
-->  [1] "vfold_cv"   "rset"       "tbl_df"     "tbl"        "data.frame"
frog_folds
-->  #  5-fold cross-validation 
-->  # A tibble: 5 x 2
-->    splits           id   
-->    <list>           <chr>
-->  1 <split [343/86]> Fold1
-->  2 <split [343/86]> Fold2
-->  3 <split [343/86]> Fold3
-->  4 <split [343/86]> Fold4
-->  5 <split [344/85]> Fold5
frog_rec <- 
  recipe(latency ~ ., data = frog_train)
class(frog_rec)
-->  [1] "recipe"
frog_rec
-->  Data Recipe
-->  
-->  Inputs:
-->  
-->        role #variables
-->     outcome          1
-->   predictor          4
frog_metric <- yardstick::metric_set(rmse)
class(frog_metric)
-->  [1] "numeric_metric_set" "metric_set"         "function"
frog_metric
-->    metric          class direction
-->  1   rmse numeric_metric  minimize
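RMSE, the only metric in the set above, is just the root of the mean squared difference between the true and predicted values. A quick base-R check on toy vectors (not the frog data):

```r
truth    <- c(10, 20, 30, 40)
estimate <- c(12, 18, 33, 39)

# root mean squared error, computed by hand
rmse_by_hand <- sqrt(mean((truth - estimate)^2))
rmse_by_hand
# [1] 2.12132
```

This matches what yardstick computes for a numeric outcome; since RMSE is minimized at a perfect fit, its direction in the metric set is "minimize".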

Tuning and fitting results for use in ensembles need to be fitted with the control arguments save_pred = TRUE and save_workflow = TRUE—these settings ensure that the assessment set predictions, as well as the workflow used to fit the resamples, are stored in the resulting object. For convenience, stacks supplies some control_stack_*() functions to generate the appropriate objects for you.

In this example, we’ll be working with tune_grid() and fit_resamples() from the tune package, so we will use the following control settings:

ctrl_grid <- stacks::control_stack_grid()
ctrl_grid
-->  grid/resamples control object
ctrl_res  <- stacks::control_stack_resamples()
ctrl_res
-->  grid/resamples control object

We’ll define three different model definitions to try to predict time to hatch—a K-nearest neighbors model (with hyperparameters to tune), a linear model, and a support vector machine model (again, with hyperparameters to tune).

Starting out with K-nearest neighbors, we begin by creating a parsnip model specification:

# create a model definition
knn_spec <-
  nearest_neighbor(
    mode = "regression", 
    neighbors = tune("k")
  ) %>%
  set_engine("kknn")

class(knn_spec)
-->  [1] "nearest_neighbor" "model_spec"
knn_spec
-->  K-Nearest Neighbor Model Specification (regression)
-->  
-->  Main Arguments:
-->    neighbors = tune("k")
-->  
-->  Computational engine: kknn

Note that, since we are tuning over several possible numbers of neighbors, this model specification defines multiple model configurations. The specific form of those configurations will be determined when specifying the grid search in tune_grid().
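Passing an integer to the `grid` argument (as done below) lets tune choose the candidate values itself. Alternatively, you can pass a data frame whose column names match the `tune()` identifiers; a hypothetical explicit grid for the `k` parameter above might look like:

```r
# an explicit tuning grid: the column name matches the tune("k") identifier
knn_grid <- data.frame(k = c(3, 5, 11, 12))
knn_grid
```

A data frame like this could then be supplied as `grid = knn_grid` in `tune_grid()`, giving exact control over which model configurations become candidates.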

From here, we extend the basic recipe defined earlier to fully specify the form of the design matrix for use in a K-nearest neighbors model:

# extend the recipe
knn_rec <-
  frog_rec %>%
  step_dummy(all_nominal()) %>%
  step_zv(all_predictors(), skip = TRUE) %>%
  step_meanimpute(all_numeric(), skip = TRUE) %>%
  step_normalize(all_numeric(), skip = TRUE)

class(frog_rec)
-->  [1] "recipe"
frog_rec
-->  Data Recipe
-->  
-->  Inputs:
-->  
-->        role #variables
-->     outcome          1
-->   predictor          4
class(knn_rec)
-->  [1] "recipe"
knn_rec
-->  Data Recipe
-->  
-->  Inputs:
-->  
-->        role #variables
-->     outcome          1
-->   predictor          4
-->  
-->  Operations:
-->  
-->  Dummy variables from all_nominal()
-->  Zero variance filter on all_predictors()
-->  Mean Imputation for all_numeric()
-->  Centering and scaling for all_numeric()

Starting with the basic recipe, we convert categorical variables to dummy variables, remove predictors that contain only a single value (zero variance), impute missing values in numeric variables using the column mean, and normalize numeric variables. Pre-processing instructions for the remaining models are defined similarly.
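To make the effect of these steps concrete, here is a base-R sketch of the same transformations on a toy data frame (illustrative only; recipes handles this bookkeeping for you, including applying training-set statistics to new data):

```r
df <- data.frame(
  age   = c(100, NA, 300),
  const = c(1, 1, 1),        # a zero-variance column
  trt   = c("a", "b", "a")   # a nominal predictor
)

# step_dummy: nominal predictor -> indicator column
df$trt_b <- as.integer(df$trt == "b")
df$trt <- NULL

# step_zv: drop columns with only a single unique value
df <- df[, vapply(df, function(col) length(unique(col)) > 1, logical(1))]

# step_meanimpute: replace NAs with the column mean
df$age[is.na(df$age)] <- mean(df$age, na.rm = TRUE)

# step_normalize: center and scale the numeric column
df$age <- (df$age - mean(df$age)) / sd(df$age)
```

After these steps, `const` is gone, the missing `age` is filled with the mean (200), and `age` is standardized to mean 0 and unit variance.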

Now, we combine the model specification and pre-processing instructions defined above to form a workflow object:

# add both to a workflow
knn_wflow <- 
  workflow() %>% 
  workflows::add_model(knn_spec) %>%
  workflows::add_recipe(knn_rec)

class(knn_wflow)
-->  [1] "workflow"
knn_wflow
-->  == Workflow ====================================================================================================================================================================================================================================================================================================================================================================================================
-->  Preprocessor: Recipe
-->  Model: nearest_neighbor()
-->  
-->  -- Preprocessor ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-->  4 Recipe Steps
-->  
-->  * step_dummy()
-->  * step_zv()
-->  * step_meanimpute()
-->  * step_normalize()
-->  
-->  -- Model -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-->  K-Nearest Neighbor Model Specification (regression)
-->  
-->  Main Arguments:
-->    neighbors = tune("k")
-->  
-->  Computational engine: kknn

Finally, we can make use of the workflow, training set resamples, metric set, and control object to tune our hyperparameters. Using the grid argument, we specify that we would like to optimize over four possible values of k using a grid search.

# tune k and fit to the 5-fold cv
set.seed(2020)
knn_res <- 
  tune::tune_grid(
    knn_wflow,
    resamples = frog_folds,
    metrics = frog_metric,
    grid = 4,
    control = ctrl_grid
  )
-->  The workflow being saved contains a recipe, which is 0.23 Mb in memory. If this was not intentional, please set the control setting `save_workflow = FALSE`.
class(knn_res)
-->  [1] "tune_results" "tbl_df"       "tbl"          "data.frame"
knn_res
-->  # Tuning results
-->  # 5-fold cross-validation 
-->  # A tibble: 5 x 5
-->    splits           id    .metrics         .notes           .predictions      
-->    <list>           <chr> <list>           <list>           <list>            
-->  1 <split [343/86]> Fold1 <tibble [4 x 5]> <tibble [0 x 1]> <tibble [344 x 5]>
-->  2 <split [343/86]> Fold2 <tibble [4 x 5]> <tibble [0 x 1]> <tibble [344 x 5]>
-->  3 <split [343/86]> Fold3 <tibble [4 x 5]> <tibble [0 x 1]> <tibble [344 x 5]>
-->  4 <split [343/86]> Fold4 <tibble [4 x 5]> <tibble [0 x 1]> <tibble [344 x 5]>
-->  5 <split [344/85]> Fold5 <tibble [4 x 5]> <tibble [0 x 1]> <tibble [340 x 5]>
knn_res[[".metrics"]][[1]]
-->  # A tibble: 4 x 5
-->        k .metric .estimator .estimate .config             
-->    <int> <chr>   <chr>          <dbl> <chr>               
-->  1     3 rmse    standard        125. Preprocessor1_Model1
-->  2     5 rmse    standard        125. Preprocessor1_Model2
-->  3    11 rmse    standard        125. Preprocessor1_Model3
-->  4    12 rmse    standard        125. Preprocessor1_Model4

This knn_res object fully specifies the candidate members, and is ready to be included in a stacks workflow.

Now, for the linear model, note that we are not optimizing over any hyperparameters. Thus, we use the fit_resamples() function rather than tune_grid() or tune_bayes() when fitting to our resamples.

# create a model definition
lin_reg_spec <-
  linear_reg() %>%
  set_engine("lm")

class(lin_reg_spec)
-->  [1] "linear_reg" "model_spec"
lin_reg_spec
-->  Linear Regression Model Specification (regression)
-->  
-->  Computational engine: lm
# extend the recipe
lin_reg_rec <-
  frog_rec %>%
  step_dummy(all_nominal()) %>%
  step_zv(all_predictors(), skip = TRUE)

lin_reg_rec
-->  Data Recipe
-->  
-->  Inputs:
-->  
-->        role #variables
-->     outcome          1
-->   predictor          4
-->  
-->  Operations:
-->  
-->  Dummy variables from all_nominal()
-->  Zero variance filter on all_predictors()
# add both to a workflow
lin_reg_wflow <- 
  workflow() %>%
  add_model(lin_reg_spec) %>%
  add_recipe(lin_reg_rec)

# fit to the 5-fold cv
set.seed(2020)
lin_reg_res <- 
  fit_resamples(
    lin_reg_wflow,
    resamples = frog_folds,
    metrics = frog_metric,
    control = ctrl_res
  )
-->  The workflow being saved contains a recipe, which is 0.22 Mb in memory. If this was not intentional, please set the control setting `save_workflow = FALSE`.
lin_reg_res
-->  # Resampling results
-->  # 5-fold cross-validation 
-->  # A tibble: 5 x 5
-->    splits           id    .metrics         .notes           .predictions     
-->    <list>           <chr> <list>           <list>           <list>           
-->  1 <split [343/86]> Fold1 <tibble [1 x 4]> <tibble [0 x 1]> <tibble [86 x 4]>
-->  2 <split [343/86]> Fold2 <tibble [1 x 4]> <tibble [0 x 1]> <tibble [86 x 4]>
-->  3 <split [343/86]> Fold3 <tibble [1 x 4]> <tibble [0 x 1]> <tibble [86 x 4]>
-->  4 <split [343/86]> Fold4 <tibble [1 x 4]> <tibble [0 x 1]> <tibble [86 x 4]>
-->  5 <split [344/85]> Fold5 <tibble [1 x 4]> <tibble [0 x 1]> <tibble [85 x 4]>

Finally, putting together the model definition for the support vector machine:

# create a model definition
svm_spec <- 
  svm_rbf(
    cost = tune("cost"), 
    rbf_sigma = tune("sigma")
  ) %>%
  set_engine("kernlab") %>%
  set_mode("regression")

# extend the recipe
svm_rec <-
  frog_rec %>%
  step_dummy(all_nominal()) %>%
  step_zv(all_predictors(), skip = TRUE) %>%
  step_meanimpute(all_numeric(), skip = TRUE) %>%
  step_corr(all_predictors(), skip = TRUE) %>%
  step_normalize(all_numeric(), skip = TRUE)

# add both to a workflow
svm_wflow <- 
  workflow() %>% 
  add_model(svm_spec) %>%
  add_recipe(svm_rec)

# tune cost and sigma and fit to the 5-fold cv
set.seed(2020)
svm_res <- 
  tune_grid(
    svm_wflow, 
    resamples = frog_folds, 
    grid = 6,
    metrics = frog_metric,
    control = ctrl_grid
  )
-->  The workflow being saved contains a recipe, which is 0.23 Mb in memory. If this was not intentional, please set the control setting `save_workflow = FALSE`.
svm_res
-->  # Tuning results
-->  # 5-fold cross-validation 
-->  # A tibble: 5 x 5
-->    splits           id    .metrics         .notes           .predictions      
-->    <list>           <chr> <list>           <list>           <list>            
-->  1 <split [343/86]> Fold1 <tibble [6 x 6]> <tibble [0 x 1]> <tibble [516 x 6]>
-->  2 <split [343/86]> Fold2 <tibble [6 x 6]> <tibble [0 x 1]> <tibble [516 x 6]>
-->  3 <split [343/86]> Fold3 <tibble [6 x 6]> <tibble [0 x 1]> <tibble [516 x 6]>
-->  4 <split [343/86]> Fold4 <tibble [6 x 6]> <tibble [0 x 1]> <tibble [516 x 6]>
-->  5 <split [344/85]> Fold5 <tibble [6 x 6]> <tibble [0 x 1]> <tibble [510 x 6]>

Altogether, we’ve created three model definitions, where the K-nearest neighbors model definition specifies 4 model configurations, the linear regression specifies 1, and the support vector machine specifies 6.

With these three model definitions fully specified, we are ready to begin stacking these model configurations. (Note that, in most applied settings, one would likely specify many more than 11 candidate members.)

2.3 Putting together a stack

The first step to building an ensemble with stacks is to create a data_stack object—in this package, data stacks are tibbles (with some extra attributes) that contain the assessment set predictions for each candidate ensemble member.

We can initialize a data stack using the stacks() function.

stacks()
-->  # A data stack with 0 model definitions and 0 candidate members.

The stacks() function works sort of like the ggplot() constructor from ggplot2—the function creates a basic structure that the object will be built on top of—except you’ll pipe the outputs rather than adding them with +.

The add_candidates() function adds ensemble members to the stack.

frog_data_st <- 
  stacks() %>%
  add_candidates(knn_res) %>%
  add_candidates(lin_reg_res) %>%
  add_candidates(svm_res)

class(frog_data_st)
-->  [1] "data_stack" "tbl_df"     "tbl"        "data.frame"
frog_data_st
-->  # A data stack with 3 model definitions and 11 candidate members:
-->  #   knn_res: 4 model configurations
-->  #   lin_reg_res: 1 model configuration
-->  #   svm_res: 6 model configurations
-->  # Outcome: latency (numeric)

As mentioned before, under the hood, a data_stack object is really just a tibble with some extra attributes. Checking out the actual data:

as_tibble(frog_data_st)
-->  # A tibble: 429 x 12
-->     latency knn_res_1_1 knn_res_1_2 knn_res_1_3 knn_res_1_4 lin_reg_res_1_1 svm_res_1_1 svm_res_1_4 svm_res_1_3 svm_res_1_5 svm_res_1_2 svm_res_1_6
-->       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>           <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
-->   1     360      -0.343      -0.395      -0.469      -0.478           194.       -0.327      -0.327     -0.0673      0.112       -0.113      -0.325
-->   2     106      -0.336      -0.390      -0.438      -0.444           123.       -0.317      -0.317     -0.0730      0.0876      -0.130      -0.315
-->   3     180      -0.343      -0.395      -0.469      -0.478           138.       -0.327      -0.327     -0.0673      0.112       -0.113      -0.325
-->   4      60      -0.350      -0.363      -0.401      -0.407           122.       -0.324      -0.324     -0.0673      0.0911      -0.113      -0.321
-->   5      39      -0.251      -0.310      -0.427      -0.441            82.9      -0.327      -0.327     -0.0673      0.112       -0.113      -0.325
-->   6     214      -0.420      -0.436      -0.505      -0.515           134.       -0.350      -0.350     -0.0867      0.127       -0.145      -0.347
-->   7      50      -0.336      -0.390      -0.438      -0.444            37.2      -0.317      -0.317     -0.0730      0.0876      -0.130      -0.315
-->   8     224      -0.336      -0.390      -0.438      -0.444           125.       -0.317      -0.317     -0.0730      0.0876      -0.130      -0.315
-->   9      63      -0.420      -0.436      -0.505      -0.515            40.3      -0.350      -0.350     -0.0867      0.127       -0.145      -0.347
-->  10      33      -0.336      -0.390      -0.438      -0.444            38.3      -0.317      -0.317     -0.0730      0.0876      -0.130      -0.315
-->  # ... with 419 more rows

The first column gives the true outcome value, and the remaining columns give the assessment set predictions for each candidate ensemble member. Since we’re in the regression case, there’s only one column per candidate; in classification settings, each candidate contributes as many columns as there are levels of the outcome variable.

That’s it! We’re now ready to evaluate how to combine the predictions from each candidate ensemble member.

2.4 Fit the stack

The outputs from each of these candidate ensemble members are highly correlated, so the blend_predictions() function performs regularization to figure out how we can combine the outputs from the stack members to come up with a final prediction.

frog_model_st <-
  frog_data_st %>%
  stacks::blend_predictions()

class(frog_model_st)
-->  [1] "linear_stack" "model_stack"  "list"
frog_model_st
-->  -- A stacked ensemble model -------------------------------------
-->  
-->  Out of 11 possible candidate members, the ensemble retained 3.
-->  Lasso penalty: 0.1.
-->  
-->  The 3 highest weighted members are:
-->  # A tibble: 3 x 3
-->    member          type              weight
-->    <chr>           <chr>              <dbl>
-->  1 svm_res_1_3     svm_rbf          583.   
-->  2 knn_res_1_1     nearest_neighbor   5.13 
-->  3 lin_reg_res_1_1 linear_reg         0.935
-->  
-->  Members have not yet been fitted with `fit_members()`.

The blend_predictions() function determines how the member model outputs will ultimately be combined in the final prediction by fitting a LASSO model on the data stack, predicting the true assessment set outcome using the predictions from each of the candidate members. Candidates with nonzero stacking coefficients become members.
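A stripped-down sketch of the same idea (not the glmnet-based implementation stacks actually uses): the L1 penalty in a LASSO metalearner shrinks the weights of weak or redundant candidates to exactly zero. Here, a tiny coordinate-descent LASSO on simulated candidate predictions, where one candidate is pure noise:

```r
set.seed(2)
truth <- rnorm(100)
# three candidate predictions: two informative, one pure noise
preds <- cbind(m1 = truth + rnorm(100, sd = 0.1),
               m2 = truth + rnorm(100, sd = 0.2),
               m3 = rnorm(100))

# soft-thresholding operator used in LASSO coordinate descent
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

lasso_weights <- function(X, y, lambda, iters = 200) {
  w <- rep(0, ncol(X))
  for (it in seq_len(iters)) {
    for (j in seq_len(ncol(X))) {
      # partial residual excluding candidate j
      r <- y - X[, -j, drop = FALSE] %*% w[-j]
      w[j] <- soft(sum(X[, j] * r) / nrow(X), lambda) / (sum(X[, j]^2) / nrow(X))
    }
  }
  w
}

w <- lasso_weights(preds, truth, lambda = 0.1)
round(w, 3)
# the weight for the pure-noise candidate shrinks to zero
```

The same mechanism is what "zeroes out" weak candidates in the stack: only the candidates surviving the penalty go on to become fitted ensemble members.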

To make sure that we have the right trade-off between minimizing the number of members and optimizing performance, we can use the autoplot() method:

theme_set(theme_bw())
autoplot(frog_model_st)

To show the relationship more directly:

autoplot(frog_model_st, type = "members")

If these results were not good enough, blend_predictions() could be called again with different values of penalty. As it is, blend_predictions() picks the penalty parameter with the numerically optimal results. To see the top results:

autoplot(frog_model_st, type = "weights")

Now that we know how to combine our model output, we can fit the candidates with non-zero stacking coefficients on the full training set.

frog_model_st
-->  -- A stacked ensemble model -------------------------------------
-->  
-->  Out of 11 possible candidate members, the ensemble retained 3.
-->  Lasso penalty: 0.1.
-->  
-->  The 3 highest weighted members are:
-->  # A tibble: 3 x 3
-->    member          type              weight
-->    <chr>           <chr>              <dbl>
-->  1 svm_res_1_3     svm_rbf          583.   
-->  2 knn_res_1_1     nearest_neighbor   5.13 
-->  3 lin_reg_res_1_1 linear_reg         0.935
-->  
-->  Members have not yet been fitted with `fit_members()`.
frog_model_st <-
  frog_model_st %>%
  stacks::fit_members()

frog_model_st
-->  -- A stacked ensemble model -------------------------------------
-->  
-->  Out of 11 possible candidate members, the ensemble retained 3.
-->  Lasso penalty: 0.1.
-->  
-->  The 3 highest weighted members are:
-->  # A tibble: 3 x 3
-->    member          type              weight
-->    <chr>           <chr>              <dbl>
-->  1 svm_res_1_3     svm_rbf          583.   
-->  2 knn_res_1_1     nearest_neighbor   5.13 
-->  3 lin_reg_res_1_1 linear_reg         0.935

Model stacks can be thought of as a group of fitted member models and a set of instructions on how to combine their predictions.

To identify which model configurations were assigned what stacking coefficients, we can make use of the collect_parameters() function:

stacks::collect_parameters(frog_model_st, "lin_reg_res")
-->  # A tibble: 1 x 2
-->    member           coef
-->    <chr>           <dbl>
-->  1 lin_reg_res_1_1 0.935
stacks::collect_parameters(frog_model_st, "knn_res")
-->  # A tibble: 4 x 3
-->    member          k  coef
-->    <chr>       <int> <dbl>
-->  1 knn_res_1_1     3  5.13
-->  2 knn_res_1_2     5  0   
-->  3 knn_res_1_3    11  0   
-->  4 knn_res_1_4    12  0
stacks::collect_parameters(frog_model_st, "svm_res")
-->  # A tibble: 6 x 4
-->    member         cost    sigma  coef
-->    <chr>         <dbl>    <dbl> <dbl>
-->  1 svm_res_1_1 0.00143 6.64e- 9    0 
-->  2 svm_res_1_2 3.59    3.95e- 4    0 
-->  3 svm_res_1_3 0.0978  1.81e- 2  583.
-->  4 svm_res_1_4 0.00849 2.16e-10    0 
-->  5 svm_res_1_5 0.256   4.54e- 1    0 
-->  6 svm_res_1_6 7.64    3.16e- 7    0

This object is now ready to predict with new data!

frog_test <- 
  frog_test %>%
  bind_cols(predict(frog_model_st, .))

Juxtaposing the predictions with the true data:

ggplot(frog_test) +
  aes(x = latency, 
      y = .pred) +
  geom_point() + 
  coord_obs_pred()

Looks like our predictions were pretty strong! How do the stack’s predictions compare to the members’ predictions, though? We can use the members = TRUE argument to generate predictions from each of the ensemble members.

member_preds <- 
  frog_test %>%
  dplyr::select(latency) %>%
  bind_cols(predict(frog_model_st, frog_test, members = TRUE))

member_preds
-->  # A tibble: 143 x 5
-->     latency .pred knn_res_1_1 lin_reg_res_1_1 svm_res_1_3
-->       <dbl> <dbl>       <dbl>           <dbl>       <dbl>
-->   1      22  50.1      -0.344            38.0     -0.0600
-->   2      83 131.       -0.344           124.      -0.0600
-->   3      58  93.5      -0.344            84.5     -0.0600
-->   4      39  47.6      -0.344            35.3     -0.0600
-->   5      68  88.9      -0.253            79.0     -0.0600
-->   6      31  92.9      -0.344            83.7     -0.0600
-->   7      58 125.       -0.344           118.      -0.0600
-->   8     106  89.8      -0.253            80.0     -0.0600
-->   9      21  48.7      -0.344            36.5     -0.0600
-->  10      30  89.3      -0.253            79.4     -0.0600
-->  # ... with 133 more rows

Now, evaluating the root mean squared error from each model:

yardstick::rmse(member_preds, truth = latency, estimate = .pred)
-->  # A tibble: 1 x 3
-->    .metric .estimator .estimate
-->    <chr>   <chr>          <dbl>
-->  1 rmse    standard        46.5
map_dfr(member_preds, rmse, truth = latency, data = member_preds) %>%
  mutate(member = colnames(member_preds))
-->      ---->  mutate: new variable 'member' (character) with 5 unique values and 0% NA
-->  # A tibble: 5 x 4
-->    .metric .estimator .estimate member         
-->    <chr>   <chr>          <dbl> <chr>          
-->  1 rmse    standard         0   latency        
-->  2 rmse    standard        46.5 .pred          
-->  3 rmse    standard       104.  knn_res_1_1    
-->  4 rmse    standard        44.7 lin_reg_res_1_1
-->  5 rmse    standard       103.  svm_res_1_3

As we can see, the stacked ensemble outperforms most of its member models, though the linear regression member actually edges it out slightly on this test set.
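Beyond RMSE, several regression metrics can be computed in one pass with yardstick::metric_set(). A small sketch using the member_preds tibble from above:

# evaluate the ensemble on multiple metrics at once
multi_metric <- yardstick::metric_set(rmse, rsq, mae)
multi_metric(member_preds, truth = latency, estimate = .pred)

The same metric set can be applied to any of the member prediction columns by swapping the estimate argument.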

Voila! You’ve now made use of the stacks package to predict red-eyed tree frog embryo hatching using a stacked ensemble!




3 Classification Model Stacking

In this vignette, we’ll tackle a multiclass classification problem using the stacks package. This vignette assumes that you’re familiar with tidymodels “proper,” as well as the basic grammar of the package, and have seen it implemented on numeric data; if this is not the case, check out the “Getting Started With stacks” vignette!

library(tidymodels)
library(stacks)
# library(tune)
# library(rsample)
# library(parsnip)
# library(workflows)
# library(recipes)
# library(yardstick)

library(tidyverse)
# library(purrr)
# library(dplyr)
# library(tidyr)

3.1 Dataset

Tree frog embryo hatching data

  • A dataset containing experimental results on hatching behavior of red-eyed tree frog embryos.

Red-eyed tree frog (RETF) embryos can hatch earlier than their normal 7ish days if they detect a potential predator threat. Researchers wanted to determine how, and when, these tree frog embryos were able to detect stimulus from their environment. To do so, they subjected embryos at varying developmental stages to “predator stimulus” by jiggling them with a blunt probe. Beforehand, though, some of the embryos were treated with gentamicin, a compound that knocks out their lateral line (a sensory organ). Researcher Julie Jung and her crew found that these factors inform whether an embryo hatches prematurely or not!

Format

A data frame with 1212 rows and 7 variables:

  • clutch RETFs lay their eggs in gelatinous “clutches” of 30-40 eggs. Eggs with the same clutch ID are siblings of each other! This variable is useful in mixed effects models. (Unordered factor.)

  • treatment The treatment group for the embryo. Either “gentamicin”, a compound that knocks out the embryos’ lateral line, or “control” for the negative control group (i.e. sensory organs intact). (Character.)

  • reflex A measure of ear function called the vestibulo-ocular reflex, categorized into bins. Ear function increases from factor levels “low”, to “mid”, to “full”. (Ordered factor.)

  • age Age of the embryo at the time it was jiggled. (Numeric, in seconds.)

  • t_o_d The time of day that the stimulus (i.e. jiggle) was applied. “morning” is 5 a.m. to noon, “afternoon” is noon to 8 p.m., and “night” is 8 p.m. to 5 a.m. (Character.)

  • hatched Whether or not the embryo hatched in response to the jiggling! Either “yes” or “no”. (Character.)

  • latency Time elapsed between the stimulus (i.e. jiggling) and hatching in response to the stimulus, in seconds. Missing values indicate that the embryo didn’t hatch in response to the stimulus. (Numeric, in seconds.)

In this article, we’ll use most of the variables in tree_frogs to predict reflex, a measure of ear function called the vestibulo-ocular reflex, categorized into bins. Ear function increases from factor levels “low”, to “mid”, to “full”.

data(tree_frogs, package="stacks")
str(tree_frogs)
-->  tibble [1,212 x 7] (S3: tbl_df/tbl/data.frame)
-->   $ clutch   : Factor w/ 62 levels "98","99","100",..: 30 20 23 3 57 2 20 16 3 37 ...
-->   $ treatment: chr [1:1212] "control" "gentamicin" "gentamicin" "control" ...
-->   $ reflex   : Factor w/ 3 levels "low","mid","full": 3 3 3 2 2 1 3 3 2 2 ...
-->   $ age      : num [1:1212] 466965 404310 426220 355360 356535 ...
-->   $ t_o_d    : chr [1:1212] "morning" "afternoon" "night" "night" ...
-->   $ hatched  : chr [1:1212] "yes" "no" "no" "no" ...
-->   $ latency  : num [1:1212] 22 NA NA NA NA 360 NA 106 180 NA ...
-->   - attr(*, "problems")= tibble [4,878 x 5] (S3: tbl_df/tbl/data.frame)
-->    ..$ row     : int [1:4878] 1213 1213 1213 1213 1213 1213 1213 1213 1213 1213 ...
-->    ..$ col     : chr [1:4878] "SUhour" "SUmin" "SUsec" "EpreSU" ...
-->    ..$ expected: chr [1:4878] "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" ...
-->    ..$ actual  : chr [1:4878] "10" "58" "15" "15" ...
-->    ..$ file    : chr [1:4878] "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" "'data-raw/tree_frogs_raw.csv'" ...
DT::datatable(tree_frogs)
skimr::skim(tree_frogs)
Data summary
Name tree_frogs
Number of rows 1212
Number of columns 7
_______________________
Column type frequency:
character 3
factor 2
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
treatment 0 1 7 10 0 2 0
t_o_d 0 1 5 9 0 3 0
hatched 0 1 2 3 0 2 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
clutch 0 1 FALSE 62 119: 42, 145: 34, 182: 34, 223: 34
reflex 0 1 FALSE 3 ful: 664, low: 286, mid: 262

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1.00 401611.2 42583.6 348405 362225 398510 433860 480120 ▇▂▃▁▅
latency 640 0.47 93.4 70.6 0 47 71 112 442 ▇▃▁▁▁
# subset the data
tree_frogs <- tree_frogs %>%
  dplyr::select(-c(clutch, latency))
# for the sake of quick typing,
pepe <- tree_frogs

Let’s plot the data to get a sense for how separable these groups are.

ggplot(pepe) +
  aes(x = treatment, y = age, color = reflex) +
  geom_jitter() +
  labs(y = "Embryo Age (s)", 
       x = "treatment",
       color = "Response")

It looks like the embryo age is pretty effective at picking out embryos with full VOR function, but the problem gets tougher for the less developed embryos! Let’s see how well the stacked ensemble can classify these tree frogs.

3.2 Defining candidate ensemble members

As in the numeric prediction setting, defining the candidate ensemble members is undoubtedly the longest part of the ensembling process with stacks. First, we split the data, generate resamples, and set some options that will be used by each model definition.

# some setup: resampling and a basic recipe
set.seed(1)

pepe_split <- initial_split(pepe)
pepe_train <- training(pepe_split)
pepe_test  <- testing(pepe_split)

pepe_folds <- rsample::vfold_cv(pepe_train, v = 5)

pepe_rec <- 
  recipe(reflex ~ ., data = pepe_train) %>%
  step_dummy(all_nominal(), -reflex) %>%
  step_zv(all_predictors())

pepe_wflow <- 
  workflow() %>% 
  add_recipe(pepe_rec)

We also need to use the same control settings as in the numeric response setting:

ctrl_grid <- control_stack_grid()
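To my understanding, control_stack_grid() is a thin convenience wrapper; it is roughly equivalent to telling tune to save the pieces that stacks needs later:

# roughly what control_stack_grid() sets up:
ctrl_grid <- tune::control_grid(
  save_pred = TRUE,      # keep out-of-sample predictions for the data stack
  save_workflow = TRUE   # keep the workflow so members can be refit later
)

Either form works here; the wrapper just ensures neither option is forgotten.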

We’ll create two model definitions to try to predict reflex: a random forest and a neural network.

Starting out with a random forest:

rand_forest_spec <- 
  rand_forest(
    mtry = tune(),
    min_n = tune(),
    trees = 500
  ) %>%
  set_mode("classification") %>%
  set_engine("ranger")

rand_forest_wflow <-
  pepe_wflow %>%
  add_model(rand_forest_spec)

rand_forest_res <- 
  tune_grid(
    object = rand_forest_wflow, 
    resamples = pepe_folds, 
    grid = 10,
    control = ctrl_grid
  )

Now, moving on to the neural network model definition:

nnet_spec <-
  mlp(hidden_units = tune(), penalty = tune(), epochs = tune()) %>%
  set_mode("classification") %>%
  set_engine("nnet")

nnet_rec <- 
  pepe_rec %>% 
  step_normalize(all_predictors())

nnet_wflow <- 
  workflow() %>%
  add_recipe(nnet_rec) %>%   # use the normalized recipe defined above
  add_model(nnet_spec)

nnet_res <-
  tune_grid(
    object = nnet_wflow, 
    resamples = pepe_folds, 
    grid = 10,
    control = ctrl_grid
  )

With these model definitions fully specified, we’re ready to start putting together an ensemble!

3.3 Putting together a stack

Building the stacked ensemble, now, only takes a few lines:

pepe_model_st <- 
  stacks() %>%                             # initialize the stack
  add_candidates(rand_forest_res) %>%      # add candidate members
  add_candidates(nnet_res) %>%
  blend_predictions() %>%                  # determine how to combine their predictions
  fit_members()                            # fit the candidates with nonzero stacking coefficients

pepe_model_st
-->  # A tibble: 7 x 4
-->    member                          type           weight class
-->    <chr>                           <chr>           <dbl> <chr>
-->  1 .pred_full_rand_forest_res_1_01 rand_forest 3.69      full 
-->  2 .pred_mid_rand_forest_res_1_05  rand_forest 0.944     mid  
-->  3 .pred_full_rand_forest_res_1_09 rand_forest 0.435     full 
-->  4 .pred_full_rand_forest_res_1_02 rand_forest 0.0857    full 
-->  5 .pred_full_rand_forest_res_1_05 rand_forest 0.0188    full 
-->  6 .pred_full_rand_forest_res_1_04 rand_forest 0.0000579 full 
-->  7 .pred_full_rand_forest_res_1_03 rand_forest 0.0000430 full

To make sure that we have the right trade-off between minimizing the number of members and optimizing performance, we can use the autoplot() method:

theme_set(theme_bw())
autoplot(pepe_model_st)

To show the relationship more directly:

autoplot(pepe_model_st, type = "members")

If these results were not good enough, blend_predictions() could be called again with different values of penalty. As it is, blend_predictions() picks the penalty parameter with the numerically optimal results. To see the top results:

autoplot(pepe_model_st, type = "weights")

There are multiple facets since the ensemble members can have different effects on different classes.

To identify which model configurations were assigned what stacking coefficients, we can make use of the collect_parameters() function:

collect_parameters(pepe_model_st, "rand_forest_res")
-->  # A tibble: 60 x 6
-->     member                mtry min_n class terms                            coef
-->     <chr>                <int> <int> <chr> <chr>                           <dbl>
-->   1 rand_forest_res_1_01     3    15 low   .pred_mid_rand_forest_res_1_01   0   
-->   2 rand_forest_res_1_01     3    15 low   .pred_full_rand_forest_res_1_01  0   
-->   3 rand_forest_res_1_01     3    15 mid   .pred_mid_rand_forest_res_1_01   0   
-->   4 rand_forest_res_1_01     3    15 mid   .pred_full_rand_forest_res_1_01  0   
-->   5 rand_forest_res_1_01     3    15 full  .pred_mid_rand_forest_res_1_01   0   
-->   6 rand_forest_res_1_01     3    15 full  .pred_full_rand_forest_res_1_01  3.69
-->   7 rand_forest_res_1_02     3    39 low   .pred_mid_rand_forest_res_1_02   0   
-->   8 rand_forest_res_1_02     3    39 low   .pred_full_rand_forest_res_1_02  0   
-->   9 rand_forest_res_1_02     3    39 mid   .pred_mid_rand_forest_res_1_02   0   
-->  10 rand_forest_res_1_02     3    39 mid   .pred_full_rand_forest_res_1_02  0   
-->  # ... with 50 more rows
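Since most of these coefficients are zero, it can help to filter the output down to the members that actually contribute. A small dplyr sketch on the tibble above:

# show only the nonzero stacking coefficients, grouped by class
collect_parameters(pepe_model_st, "rand_forest_res") %>%
  dplyr::filter(coef != 0) %>%
  dplyr::arrange(class, dplyr::desc(coef))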

This object is now ready to predict with new data!

pepe_pred <-
  pepe_test %>%
  bind_cols(predict(pepe_model_st, ., type = "prob"))

Computing the ROC AUC for the model:

yardstick::roc_auc(
  pepe_pred,
  truth = reflex,
  contains(".pred_")
  )

Looks like our predictions were pretty strong!

How do the stack’s predictions compare to those of the individual members, though?

We can use the members argument to generate predictions from each of the ensemble members.

pepe_pred <-
  pepe_test %>%
  dplyr::select(reflex) %>%
  bind_cols(
    predict(
      pepe_model_st,
      pepe_test,
      type = "class",
      members = TRUE
      )
    )

pepe_pred
-->  # A tibble: 303 x 8
-->     reflex .pred_class .pred_class_rand_forest_res_1_05 .pred_class_rand_forest_res_1_01 .pred_class_rand_forest_res_1_03 .pred_class_rand_forest_res_1_02 .pred_class_rand_forest_res_1_09 .pred_class_rand_forest_res_1_04
-->     <fct>  <fct>       <fct>                            <fct>                            <fct>                            <fct>                            <fct>                            <fct>                           
-->   1 full   low         full                             full                             full                             full                             full                             full                            
-->   2 low    mid         low                              low                              low                              low                              low                              low                             
-->   3 full   low         full                             full                             full                             full                             full                             full                            
-->   4 low    mid         low                              low                              low                              low                              low                              low                             
-->   5 full   low         full                             full                             full                             full                             full                             full                            
-->   6 full   low         full                             full                             full                             full                             full                             full                            
-->   7 mid    mid         low                              low                              low                              low                              low                              low                             
-->   8 mid    full        mid                              mid                              mid                              mid                              mid                              mid                             
-->   9 low    full        mid                              mid                              mid                              mid                              mid                              mid                             
-->  10 full   low         full                             full                             full                             full                             full                             full                            
-->  # ... with 293 more rows
map_dfr(
  setNames(colnames(pepe_pred), colnames(pepe_pred)),
  ~mean(pepe_pred$reflex == pull(pepe_pred, .x))
) %>%
  pivot_longer(c(everything(), -reflex))
-->      ---->  pivot_longer: reorganized (.pred_class, .pred_class_rand_forest_res_1_05, .pred_class_rand_forest_res_1_01, .pred_class_rand_forest_res_1_03, .pred_class_rand_forest_res_1_02, …) into (name, value) [was 1x8, now 7x3]
-->  # A tibble: 7 x 3
-->    reflex name                              value
-->     <dbl> <chr>                             <dbl>
-->  1      1 .pred_class                      0.0627
-->  2      1 .pred_class_rand_forest_res_1_05 0.865 
-->  3      1 .pred_class_rand_forest_res_1_01 0.838 
-->  4      1 .pred_class_rand_forest_res_1_03 0.832 
-->  5      1 .pred_class_rand_forest_res_1_02 0.845 
-->  6      1 .pred_class_rand_forest_res_1_09 0.855 
-->  7      1 .pred_class_rand_forest_res_1_04 0.868

Voila! You’ve now made use of the stacks package to predict tree frog embryo ear function using a stacked ensemble!